Intelligence for Embedded Systems by Cesare Alippi

Author: Cesare Alippi
Publisher: Springer International Publishing, Cham


Fig. 7.2 The neural network data flow ported onto an 8-bit architecture. The description of the flow is similar to that given in the caption of Fig. 7.1, with the appropriate change of word length for the entities involved

The data flows associated with the 16-bit and 8-bit implementations are shown in Figs. 7.1 and 7.2, respectively. The complete description of the architectural operations is given in the caption of Fig. 7.1.

The 16-bit implementation uses the same representation for inputs, outputs, and weights. With such a choice, we are safe from any possible occurrence of overflow. The sum following the product of the generic input by the corresponding weight is still performed at full resolution. Differently, the 8-bit implementation adopts different resolutions for inputs, outputs, and weights (input and output values are represented with one fixed-point coding, while weights are represented with another). The neuron biases are still represented at full resolution. After the operations, shifts are introduced to bring the obtained numbers back to the envisaged word length. The embedded code computing the neural activation value, ready to feed the nonlinear activation function of the hidden layer, is given in Listing 7.1.
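A minimal sketch of how such a fixed-point multiply-accumulate might look is given below. It is not the book's Listing 7.1: the word lengths, the shift amount POST_MAC_SHIFT, and the names N_INPUTS and neuron_activation are illustrative assumptions for an 8-bit input/weight coding accumulated in a wider register.

```c
#include <stdint.h>

#define N_INPUTS       8   /* illustrative number of neuron inputs          */
#define POST_MAC_SHIFT 7   /* assumed right shift bringing the accumulated
                              sum back to the 8-bit input/output coding     */

/* Sketch of a fixed-point neuron activation: 8-bit inputs and weights are
   multiplied at full resolution, accumulated in a wider register to avoid
   overflow, and a final shift returns the result to the target word length. */
int8_t neuron_activation(const int8_t x[N_INPUTS],
                         const int8_t w[N_INPUTS],
                         int16_t bias)
{
    int32_t acc = bias;                        /* bias at full resolution    */
    for (int i = 0; i < N_INPUTS; ++i)
        acc += (int32_t)x[i] * (int32_t)w[i];  /* full-resolution product    */

    acc >>= POST_MAC_SHIFT;                    /* back to the 8-bit coding   */
    if (acc >  127) acc =  127;                /* saturate on overflow       */
    if (acc < -128) acc = -128;
    return (int8_t)acc;
}
```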

The evaluation of the nonlinear hyperbolic tangent function is particularly costly from the computational point of view. To deal with the issue, the best solution is to rely on a Look-Up Table (LUT) memory enumerating the input–output relationship at a set of sample points. The input of the memory is the neural activation value (i.e., the scalar product between the neuron inputs and the associated weights); the output is the content stored in the memory cells, which represents the value the activation function assumes in correspondence of that input. We aimed at keeping the size of the LUT as small as possible. As such, the input values were coded as 6-bit unsigned values, for a total of a 64-cell LUT memory; the output values follow the encoding used for the inputs and, as such, depend on the chosen architecture (16-bit or 8-bit). We comment that the input of the LUT is unsigned. The reason is that the hyperbolic tangent (Th) is an odd function, for which Th(−x) = −Th(x). As such, we can represent the inputs by uniformly subdividing the interval [0, 4] (values above input 4 provide an output saturated at 1): there is no need to represent the full interval [−4, 4], with an immediate memory saving.
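A possible shape for such a LUT evaluation is sketched below, exploiting the oddness of the hyperbolic tangent and the saturation above 4. The 64-cell table, the 8-bit output scaling with 6 fractional bits, and the name tanh_lut_eval are assumptions made for illustration, not the book's Listing 7.2.

```c
#include <stdint.h>

#define LUT_SIZE 64

/* 64-cell table enumerating tanh over [0, 4): cell i would hold
   tanh(i * 4.0 / LUT_SIZE) scaled to the output coding (assumed here to be
   a signed 8-bit coding with 6 fractional bits), computed off-line.       */
static const int8_t tanh_lut[LUT_SIZE] = { 0 /* remaining entries precomputed */ };

/* 'index' is the magnitude of the neural activation already reduced to the
   unsigned 6-bit LUT addressing range; 'negative' carries the sign, handled
   outside the table since tanh(-x) = -tanh(x).                             */
int8_t tanh_lut_eval(uint8_t index, int negative)
{
    if (index >= LUT_SIZE)            /* |activation| >= 4: output saturates */
        index = LUT_SIZE - 1;
    int8_t y = tanh_lut[index];
    return negative ? (int8_t)(-y) : y;
}
```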

The approximation ability of the solution is shown in Fig. 7.3 for the chosen output coding.

The embedded code implementing the LUT is shown in Listing 7.2. As can be noted in Fig. 7.3, the quantization error does not introduce a bias in the approximated function, since we opted for rounding the reduced argument instead of simply truncating it. The cost of the extra shift and sum needed to implement the rounding operator is well compensated by the disappearance of the bias term in the approximation of the hyperbolic tangent function.
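As a concrete illustration of this point, the fragment below contrasts a truncating index reduction with a rounding one; the shift amount ARG_SHIFT and the helper names are hypothetical, and after rounding the index must still be clipped against the table size as done in the evaluation sketch above.

```c
#include <stdint.h>

#define ARG_SHIFT 4   /* assumed number of low-order bits dropped when
                         reducing the activation magnitude to a LUT index */

/* Truncation: dropping the low bits always rounds toward zero and
   introduces a systematic bias in the approximated tanh.                 */
static inline uint8_t lut_index_trunc(uint16_t mag)
{
    return (uint8_t)(mag >> ARG_SHIFT);
}

/* Rounding: adding half an LSB of the retained part before the shift
   costs one extra constant and addition but removes that bias.           */
static inline uint8_t lut_index_round(uint16_t mag)
{
    return (uint8_t)((mag + (1u << (ARG_SHIFT - 1))) >> ARG_SHIFT);
}
```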


